Optical Character Recognition (OCR)
Introduction
OCR runs while files are being ingested, converting image files into text documents that can be analyzed in Sintelix, as illustrated below:
Compatible Formats
OCR can convert documents in compatible formats, including JPEG, PNG, GIF, TIFF, BMP, and scanned PDF.
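The formats above correspond to file extensions. This minimal Python sketch is independent of Sintelix and does not call any Sintelix API. It lists the files in a folder whose extensions match these formats, which can help when preparing a batch of files for a collection. The extension set is an assumption: `.jpg`/`.jpeg` and `.tif`/`.tiff` are common variants of the listed formats. A PDF's extension cannot show whether the file is a scanned (image-only) PDF, so every PDF is reported as a candidate.

```python
from pathlib import Path

# Assumed extensions for the listed formats (JPEG, PNG, GIF, TIFF, BMP, PDF).
# PDFs are included because scanned PDFs are supported, but the extension alone
# cannot tell a scanned PDF from one that already contains a text layer.
OCR_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".bmp", ".pdf"}


def is_ocr_candidate(name):
    """Return True if the file name has one of the assumed OCR extensions."""
    return Path(name).suffix.lower() in OCR_EXTENSIONS


def ocr_candidates(folder):
    """Return a sorted list of files under `folder` with an OCR extension."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and is_ocr_candidate(p.name)
    )
```

For example, `ocr_candidates("scans")` returns the matching files anywhere under a hypothetical `scans` folder. The comparison is case-insensitive, so `PAGE1.JPG` matches as well as `page1.jpg`.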
Requirements
OCR capability requires:
- OCR enabled on your activation key
- Sintelix connected to an installed OCR server (see Connect OCR and Install Optical Character Recognition (OCR))
- OCR enabled in the Ingestion Configuration (see Optical Character Recognition (OCR) Processing)
Using the OCR Feature
Once OCR has been enabled in the Ingestion Configuration, it is automatically applied to any image files added to a collection for ingestion.
Once a document has been processed, you can edit its text and export it just like any other document.
Forms
When ingesting forms, you can:
- select the option in the OCR Ingestion options to mark up fields in documents that contain forms, or
- define a PDF form configuration that specifies how form fields are marked up in the document, and then add that configuration to the Ingestion Configuration (see PDF Form Ingestion).